Roger Fang, Sanna Tuladhar

May 2006 Journal of Computing Sciences in Colleges, volume 21 issue 5 
Publisher: Consortium for Computing Sciences in Colleges 

Full text available: pdf Additional Information: full citation, abstract, references, index terms



Data warehousing and data mining are technologies that deliver critical and optimally 
useful information to facilitate performance analysis of business organizations. These 
technologies are not only an emerging trend in information technology but also a booming 
market in a range of industries. In light of this continuously growing demand, schools are 
accelerating to prepare students with these technologies. This paper describes the key 
components that comprise a course which would introduce both ... 



Exploring data mining implementation
Karim K. Hirji 

July 2001 Communications of the ACM, volume 44 issue 7 

Karuna P. Joshi, Anupam Joshi, Yelena Yesha, Raghu Krishnapuram

November 1999 Proceedings of the 2nd international workshop on Web information
and data management WIDM '99
Publisher: ACM Press 

Full text available: pdf(1.66 MB) Additional Information: full citation, abstract, references, citings, index terms

Analyzing Web Logs for usage and access trends can not only provide important 
information to web site developers and administrators, but also help in creating adaptive 
web sites. While there are many existing tools that generate fixed reports from web logs, 
they typically do not allow ad-hoc analysis queries. Moreover, such tools cannot discover 
hidden patterns of access embedded in the access logs. We describe a relational OLAP 
(ROLAP) approach for creating a web-log warehouse. This is pop ... 

August 1999 Proceedings of the fifth ACM SIGKDD international conference on
Publisher: ACM Press

Full text available: pdf Additional Information: full citation, references, citings, index terms



5 Discovering Internet marketing intelligence through online analytical web usage 
mining 

Alex G. Buchner, Maurice D. Mulvenna

December 1998 ACM SIGMOD Record, volume 27 issue 4 
Publisher: ACM Press 

Full text available: pdf(772.06 KB) Additional Information: full citation, abstract, citings, index terms

This article describes a novel way of combining data mining techniques on Internet data in
order to discover actionable marketing intelligence in electronic commerce scenarios. The
data that is considered not only covers various types of server and web meta information, 
but also marketing data and knowledge. Furthermore, heterogeneity resolution thereof 
and Internet- and electronic commerce-specific pre-processing activities are embedded. A 
generic web log data hypercube is formally defined ... 

personalization
August 2003 Proceedings of the ninth ACM SIGKDD international conference on
Knowledge discovery and data mining KDD '03

Publisher: ACM Press 

Full text available: pdf(429.65 KB) Additional Information: full citation, abstract, references, citings, index terms

Web personalization is the process of customizing a Web site to the needs of each specific 
user or set of users, taking advantage of the knowledge acquired through the analysis of 
the user's navigational behavior. Integrating usage data with content, structure or user 
profile data enhances the results of the personalization process. In this paper, we present 
SEWeP, a system that makes use of both the usage logs and the semantics of a Web 
site's content in order to personalize it. Web content is ... 

Keywords: Web mining, Web personalization, concept hierarchies, semantic annotation 
of Web content 



Fast detection of communication patterns in distributed executions
Thomas Kunz, Michiel F. H. Seuren 

November 1997 Proceedings of the 1997 conference of the Centre for Advanced
Studies on Collaborative research CASCON '97

Publisher: IBM Press 

Full text available: pdf(4.21 MB) Additional Information: full citation, abstract, references, index terms

Understanding distributed applications is a tedious and difficult task. Visualizations based 
on process-time diagrams are often used to obtain a better understanding of the
execution of the application. The visualization tool we use is Poet, an event tracer 
developed at the University of Waterloo. However, these diagrams are often very complex 
and do not provide the user with the desired overview of the application. In our
experience, such tools display repeated occurrences of non-trivial commun ... 

8 Contributed articles on online, interactive, and anytime data mining: Requirements for
clustering data streams
Daniel Barbara

January 2002 ACM SIGKDD Explorations Newsletter volume 3 issue 2 

Publisher: ACM Press 

Full text available: pdf(486.13 KB) Additional Information: full citation, abstract, references, citings

Scientific and industrial examples of data streams abound in astronomy,
telecommunication operations, banking and stock-market applications, e-commerce and 
other fields. A challenge imposed by continuously arriving data streams is to analyze 
them and to modify the models that explain them as new data arrives. In this paper, we 
analyze the requirements needed for clustering data streams. We review some of the 
latest algorithms in the literature and assess if they meet these requirements. 

Keywords: Data streams, clustering, outliers, tracking changing models 

navigational patterns
November 2004 Proceedings of the 6th annual ACM international workshop on Web
information and data management WIDM '04
Publisher: ACM Press 

Full text available: pdf(502.59 KB) Additional Information: full citation, abstract, references, citings, index terms

The amounts of information residing on web sites make users' navigation a hard task. To 
address this problem, web sites provide recommendations to the end users, based on 
similar users' navigational patterns mined from past visits. In this paper we introduce a 
recommendation method, which integrates usage data recorded in web logs, and the 
conceptual relationships between web documents. In the proposed framework, the
usage-oriented URI representation of web pages and users' behavior is augmen ...

Keywords: concept hierarchies, semantic web mining, semantic web personalization, 
web content semantics 



10 Book reviews: Data mining: concepts and techniques by Jiawei Han and Micheline
Kamber

Fernando Berzal, Nicolás Marín
Publisher: ACM Press

Full text available: pdf(308.51 KB) Additional Information: full citation




11 DBMiner: a system for data mining in relational databases and data warehouses
Jiawei Han, Jenny Y. Chiang, Sonny Chee, Jianping Chen, Qing Chen, Shan Cheng, Wan 
Gong, Micheline Kamber, Krzysztof Koperski, Gang Liu, Yijun Lu, Nebojsa Stefanovic, Lara 
Winstone, Betty B. Xia, Osmar R. Zaiane, Shuhua Zhang, Hua Zhu 
November 1997 Proceedings of the 1997 conference of the Centre for Advanced
Studies on Collaborative research CASCON '97
Publisher: IBM Press 
Full text available: pdf(280.67 KB) Additional Information: full citation, abstract, references, citings, index terms

A data mining system, DBMiner, has been developed for interactive mining of multiple- 
level knowledge in large relational databases and data warehouses. The system 
implements a wide spectrum of data mining functions, including characterization, 
comparison, association, classification, prediction, and clustering. By incorporating several 
interesting data mining techniques, including OLAP and attribute-oriented induction, 
statistical analysis, progressive deepening for mining multiple-level knowled ... 

12 Parallel database processing on a 100 Node PC cluster: cases for decision support
query processing and data mining

Takayuki Tamura, Masato Oguchi, Masaru Kitsuregawa

November 1997 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
(CDROM) Supercomputing '97
Publisher: ACM Press 

Full text available: pdf(157.74 KB) Additional Information: full citation, abstract, references, citings

We developed a PC cluster system consists of 100 PCs. Each PC employs the 200MHz
Pentium Pro CPU and is connected with others through an ATM switch. We picked up two
kinds of data intensive applications. One is decision support query processing. And the
other is data mining, specifically, association rule mining. As a high speed network, ATM
technology has recently come to be a de facto standard. While other high performance 
network standards are also available, ATM networks are widely used from ... 

13 Research track poster: Privacy-preserving distributed k-means clustering over
arbitrarily partitioned data

Geetha Jagannathan, Rebecca N. Wright

August 2005 Proceeding of the eleventh ACM SIGKDD international conference on
Knowledge discovery in data mining KDD '05
Publisher: ACM Press 

Full text available: pdf(872.07 KB) Additional Information: full citation, abstract, references, citings, index terms

Advances in computer networking and database technologies have enabled the collection
and storage of vast quantities of data. Data mining can extract valuable knowledge from
this data, and organizations have realized that they can often obtain better results by 
pooling their data together. However, the collected data may contain sensitive or private 
information about the organizations or their customers, and privacy concerns are 
exacerbated if data is shared between multiple organizations. Distri ... 

Strategic directions in electronic commerce and digital libraries: towards a digital
agora

Nabil Adam, Yelena Yesha

December 1996 ACM Computing Surveys (CSUR), volume 28 issue 4 
Publisher: ACM Press 

Full text available: pdf(244.34 KB) Additional Information: full citation, references, citings, index terms



15 Information management technology in Asia: InfiniteDB: a pc-cluster based parallel 

massive database management system 
Jianzhong Li, Hong Gao, Jizhou Luo, Shengfei Shi, Wei Zhang

June 2007 Proceedings of the 2007 ACM SIGMOD international conference on 
Management of data SIGMOD '07 

Publisher: ACM Press 

Full text available: Additional Information: 

This paper describes a PC-cluster based parallel DBMS, InfiniteDB, developed by the
authors. InfiniteDB aims at efficiently storing and processing of massive databases in 
response to the rapidly growing in database size and the need of high performance 
analyzing of massive databases. It supports the parallelisms of intra-query, inter-query, 
intra-operation, inter-operation and pipelining. It provides effective strategies for 
processing massive databases including the multiple data declusterin ...

Keywords: data declustering, parallel algorithm, parallel database, parallel query 
processing 



16 DB-4 (databases): similarity search: Localized signature table: fast similarity search
on transaction data

Qiang Jing, Rui Yang, Panos Kalnis, Anthony K. H. Tung 

November 2004 Proceedings of the thirteenth ACM international conference on 
Information and knowledge management CIKM '04 

Publisher: ACM Press 

Full text available: pdf(200.77 KB) Additional Information: full citation, abstract, references, index terms

Recently, techniques for supporting efficient similarity search over huge transaction 
datasets have emerged as an important research area. Several indexing schemes have
been proposed towards this direction. Typically, these schemes provide a tradeoff 
between searching efficiency and indexing overhead in terms of space. 

In this paper, we propose a novel indexing scheme for similarity search on transaction
data. Based on well-studied clustering techniques, we develop a construction algor ... 

Keywords: data mining, indexing, similarity search, transaction data 




Extracting predicates from mining models for efficient query evaluation

Surajit Chaudhuri, Vivek Narasayya, Sunita Sarawagi
September 2004 ACM Transactions on Database Systems (TODS), volume 29 issue 3
Publisher: ACM Press

Full text available: pdf(698.37 KB) Additional Information: full citation, abstract, references, index terms

Modern relational database systems are beginning to support ad hoc queries on mining
models. In this article, we explore novel techniques for optimizing queries that contain 
predicates on the results of application of mining models to relational data. For such 
queries, we use the internal structure of the mining model to automatically derive 
traditional database predicates. We present algorithms for deriving such predicates for a 
large class of popular discrete mining models: decision trees, nai ... 

Keywords: Complex predicate optimization, simpler rules from complex predictive 
functions 



Web mining for web personalization 
Magdalini Eirinaki, Michalis Vazirgiannis 

February 2003 ACM Transactions on Internet Technology (TOIT), volume 3 issue 1 
Publisher: ACM Press 

Full text available: pdf(293.73 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Web personalization is the process of customizing a Web site to the needs of specific
users, taking advantage of the knowledge acquired from the analysis of the user's
navigational behavior (usage data) in correlation with other information collected in the 
Web context, namely, structure, content, and user profile data. Due to the explosive 
growth of the Web, the domain of Web personalization has gained great momentum both 
in the research and commercial areas. In this article we present a survey ...

Keywords: WWW, Web personalization, Web usage mining, user profiling 



Business intelligence: Developing a characterization of business intelligence
workloads for sizing new database systems
Ted J. Wasserman, Patrick Martin, David B. Skillicorn, Haider Rizvi

November 2004 Proceedings of the 7th ACM international workshop on Data 
warehousing and OLAP DOLAP '04 

Publisher: ACM Press 

Full text available: pdf(404.55 KB) Additional Information: full citation, abstract, references, index terms, review

Computer system sizing involves estimating the amount of hardware resources needed to
support a new workload not yet deployed in a production environment. In order to 
determine the type and quantity of resources required, a methodology is required for 
describing the new workload. In this paper, we discuss the sizing process for database 
management systems and describe an analysis for characterizing business intelligence 
(BI) workloads, using the TPC-H benchmark as our workload basis. The char ... 

Keywords: business intelligence, capacity planning, clustering, sizing, workload 
characterization 



20 Short papers: COFI approach for mining frequent itemsets revisited 
Mohammad El-Hajj, Osmar R. Zaïane

June 2004 Proceedings of the 9th ACM SIGMOD workshop on Research issues in data
mining and knowledge discovery DMKD '04

Publisher: ACM Press 

Full text available: pdf(267.02 KB) Additional Information: full citation, abstract, references

The COFI approach for mining frequent itemsets, introduced recently, is an efficient
algorithm that was demonstrated to outperform state-of-the-art algorithms on synthetic
data. For instance, COFI is not only one order of magnitude faster and requires 
significantly less memory than the popular FP-Growth, it is also very effective with 
extremely large datasets, better than any reported algorithm. However, COFI has a 
significant drawback when mining dense transactional databases which is the case ...